Abstract

We model the Internet as a network of interconnected Autonomous Systems (ASs) which self-organize
in the complete absence of centralized control. Our aim is to capture how the Internet evolves
by reproducing the assembly process that has led to its actual structure. To this end, we propose a
growing weighted network model driven by competition for resources and adaptation to maintain
functionality in a demand and supply "equilibrium". On the demand side, we consider the
environment, a pool of users who need to transfer information and ask for service. On the supply
side, ASs compete to gain users but, to be able to provide service efficiently, they must adapt their
bandwidth as a function of their size. Hence, the Internet is not modeled as an isolated system:
the environment, in the form of a pool of users, is also a fundamental part which must be taken into
account. Competition for users produces both big and small ASs, so that not all ASs are identical.
New connections between ASs are made, or old ones are reinforced, according to the adaptation needs.
Thus, the evolution of the Internet cannot be fully understood if it is described just as an isolated
technological system. A socio-economic perspective must also be considered.

PACS numbers: 89.20.Hh, 05.70.Ln, 87.23.Ge, 89.75.Hc 
I. INTRODUCTION



In an attempt to bring theory closer to reality, many researchers working on the
new and rapidly evolving science of complex networks have recently shifted focus from
unweighted graphs to weighted networks. Commonly, interactions between elements in
network-representable complex real systems (be they communication systems such as
the Internet, transportation infrastructures, social communities, or biological or biochemical
systems) are not of the same magnitude. It seems natural that the first, simpler
representations, where edges between pairs of vertices are quantified just as present or absent,
give way to more complex ones, where edges are no longer in binary states but may stand
for connections of different strength.

Weight is just one of the relevant ingredients in bringing network modeling closer to
reality. Others come from the fact that real systems are not static but evolve. As broadly
recognized, growth and preferential attachment are also key issues at the core of a set of
recent network models focusing on evolution from a statistical physics approach.
These models have been able to approximate some topological features
observed in many real networks, specifically the small-world property or a power-law degree
distribution, as a result of the organizing principles acting at each stage of the network
formation process. Although they are a great step forward in the understanding of the laws that
shape network evolution, these new degree-driven models cannot describe other empirical
properties. In order to achieve representations that closely match reality, it is
therefore necessary to uncover new mechanisms.

Following this motivation, we believe that the general view of networks as isolated
systems, although possibly appropriate in some cases, must be changed if we want to describe
properly complex systems which do not arise spontaneously but self-organize within
a medium in order to perform a function. Many networks evolve in an environment with which
they interact and which usually provides the clues to understand functionality. Therefore,
rules defined on the basis of internal mechanisms alone, such as preferential attachment, which
acts internally at the local scale to connect nodes through edges, are not enough. When
analyzing the dynamics of network assembly, the interlock of its constituents with the
environment cannot be systematically ignored.

With the aim of approaching applicability, in this work we blend all the ideas above to present
a growing network model in which both nodes and links are weighted. The dynamical
evolution is driven by exponential growth, competition for resources and adaptation to
maintain functionality in a demand and supply "equilibrium", key mechanisms which may
be relevant in a wide range of self-organizing systems, in particular those where functionality
is tied to communication or traffic. The medium in which the network grows and with which it
interacts is here represented by a pool of elements which, at the same time, provide resources
to the constituents of the network and demand functionality, say for instance users in the
case of the Internet or passengers in the case of the world-wide airport network.
Competition is here understood as a struggle between network nodes for new resources and is
modeled as a rich-get-richer (preferential attachment) process. For their part, the captured
elements demand functionality, so that nodes must adapt in order to perform efficiently. This
adaptation translates into the creation of weighted links between nodes.

In this work, we apply those ideas to the Internet. In the realm of complexity theory,
the Internet is a paradigmatic example, and significant efforts have been devoted to the
development of models which reproduce the topological properties observed in its maps.
Candidates run from topology generators [13, 14] to degree-driven growing network models
[15, 16] or Highly Optimized Tolerance (HOT) models [17]. Some of them reproduce
heavy-tailed degree distributions and small-world properties, but perform poorly when
estimating correlations or other characteristic properties, such as the rich-club phenomenon or
the k-core structure. By contrast, we will show that our model reproduces a large number
of observed topological features: the small-world property; the scale-free
degree distribution $P(k)$; a high clustering coefficient $c_k$ that shows a hierarchical structure;
disassortative degree-degree correlations, quantified by means of the average nearest
neighbors degree of nodes of degree $k$, $\bar{k}_{nn}(k)$ [18]; the scaling of the higher order loop
structure recently analyzed by Bianconi et al.; the distributions of the betweenness centrality,
$P(b)$, and of the number of triangles passing through a node, $P(T)$; and, finally, the k-core
decomposition uncovering its hierarchical organization [20].

We will consider the evolution of the Internet at the Autonomous System (AS) level. ASs
are defined as independently administered domains which autonomously determine internal
communications and routing policies [11] and, as a first approximation, we can assign each
AS to an Internet Service Provider (ISP). This level of description is a coarse-grained
representation of the Internet. Nevertheless, further detail is not necessary when aiming to
explain and predict the large-scale behavior. Thus, the network will be made up of ASs as
nodes, connected among them by links which can be of different strength or bandwidth.
On the environment side, we take hosts as the users.

In the next sections we analyze the growth of the Internet over the last years and then
present the model. Working in the continuum approximation, we find analytically the
distribution of the sizes (in number of users) of ASs and the degree distribution. Then, we
introduce an algorithm to simulate network assembly. At this stage, we also make a
first attempt at taking geographical constraints into account. Finally, the synthetic networks
are compared to the real maps of the Internet through a variety of measures.



II. THE GROWTH OF THE INTERNET 



Let $W(t)$ be the total number of users in the environmental pool at a given time $t$,
measured as hosts. $N(t)$ and $E(t)$ stand for the number of ASs and of edges among them
in the network, respectively. Empirical measures of the growth in the number of users
have been obtained from the Hobbes Internet Timeline [22]. The growth of the network is
analyzed from AS maps collected by the Oregon Route Views project, which has recorded data
since November 1997 [23]. According to those observations, shown in Fig. 1, we will assume
exponential growth for these quantities: $W(t) \approx W_0 e^{\alpha t}$, $N(t) \approx N_0 e^{\beta t}$,
and $E(t) \approx E_0 e^{\delta t}$. These exponential growths, in turn, determine the scaling
relations with the system size: $W \propto N^{\alpha/\beta}$, $E \propto N^{\delta/\beta}$ and
$\langle k \rangle \propto N^{\delta/\beta - 1}$.

The rates of growth can be measured to be $\alpha = 0.036 \pm 0.001$, $\beta = 0.0304 \pm 0.0003$,
and $\delta = 0.0330 \pm 0.0002$ (in units of month$^{-1}$), so that $\alpha \gtrsim \delta \gtrsim \beta$.
These three rates are quite close to each other but they are not equal. In fact, the inequality
$\alpha > \beta$ must hold in order to preserve network functionality. When the number of users
increases at a rate $\alpha$, there are two mechanisms capable of compensating the demand they
represent: the creation of new nodes and the creation of new connections by nodes already present
in the network. When both mechanisms take place simultaneously, the rate of growth of new nodes,
$\beta$, as well as the rate for the number of connections, $\delta$, must necessarily be smaller
than $\alpha$. Any other situation would lead to an imbalance between demand and supply of service
in the system. On the other hand, in a connected network, $\delta$ must be equal to or greater than
$\beta$. If $\delta$ equals $\beta$, the average number of connections per node, or average degree,
remains constant in time, whereas it increases when $\delta > \beta$. This increase could correspond
to a demand per user which is not constant but grows in time, probably due to the increase of the
power of computers over time and, as a consequence, to the ability to transfer bigger files or to use
more demanding applications.

FIG. 1: Temporal evolution of the number of hosts, autonomous systems and connections among
them from November 1997 to May 2002 (horizontal axis: months since November 1997). Solid lines
are the best fit estimates. Each point for the number of ASs and connections is an average over
one month of daily measurements. Error bars are of the order of the symbol size.
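As an illustration of how such rates can be estimated, the sketch below fits an exponential rate by least squares on the logarithm of a time series and then evaluates the scaling exponents implied by the three quoted rates. The series is synthetic and noise-free (it is generated from the quoted $\alpha$ only to check the fit, and is not taken from the Hobbes or Route Views data), and the helper name `fit_exponential_rate` is ours.

```python
import math

def fit_exponential_rate(times, values):
    """Return the least-squares slope of log(values) versus times."""
    logs = [math.log(v) for v in values]
    n = len(times)
    t_mean = sum(times) / n
    y_mean = sum(logs) / n
    cov = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, logs))
    var = sum((t - t_mean) ** 2 for t in times)
    return cov / var

# Rates quoted in the text (per month).
alpha, beta, delta = 0.036, 0.0304, 0.0330

# Synthetic, noise-free host counts; the prefactor 1.0 is arbitrary.
months = list(range(55))  # November 1997 to May 2002, one point per month
hosts = [1.0 * math.exp(alpha * t) for t in months]
fitted_alpha = fit_exponential_rate(months, hosts)

# Exponents of the scaling relations with the number of ASs, N.
users_exponent = alpha / beta              # W ~ N^(alpha/beta)
edges_exponent = delta / beta              # E ~ N^(delta/beta)
mean_degree_exponent = delta / beta - 1.0  # <k> ~ N^(delta/beta - 1)

print(fitted_alpha, users_exponent, edges_exponent, mean_degree_exponent)
```

Since $\delta/\beta - 1$ is small but positive for these rates, the average degree grows only slowly with the number of ASs.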

III. THE MODEL 

We define our model according to the following rules: (i) At rate $\alpha W(t)$, new users join
the system and choose node $i$ according to some preference function, $\Pi_i(\{\omega_j(t)\})$, where
$\omega_j(t)$, $j = 1, \cdots, N(t)$, is the number of users already connected to node $j$ at time $t$. The
function $\Pi_i(\{\omega_j(t)\})$ is normalized so that $\sum_i \Pi_i(\{\omega_j(t)\}) = 1$ at any time. (ii) At rate
$\beta N(t)$, new nodes join the network with an initial number of users, $\omega_0$, randomly withdrawn
from the pool of users already attached to existing nodes. Therefore, $\omega_0$ can be understood
as the minimum number of users required to keep nodes in business. (iii) At rate $\lambda$, each
user changes AS and chooses a new one using the same preference function $\Pi_i(\{\omega_j(t)\})$.
Finally, (iv) each node tries to adapt its number of connections to other nodes according to
its present number of users, or size, in an attempt to provide them with adequate functionality.

With all specifications above, we will work in the continuum approximation to find some 
analytic results, specifically the distribution of the sizes of ASs and the degree distribution. 
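Rules (i)-(iii) can be sketched as a discrete-time update of the node sizes. This is only an illustration under our own assumptions: time is advanced in unit steps, the numbers of events per step are rounded to integers, the linear preference $\Pi_i = \omega_i / W$ considered later in the text is used for concreteness, and the function names and toy parameter values are hypothetical. Rule (iv), the adaptation of connections, is not included.

```python
import random

def pick_by_size(sizes, rng):
    """Draw index i with probability sizes[i] / sum(sizes) (linear preference)."""
    return rng.choices(range(len(sizes)), weights=sizes, k=1)[0]

def evolve_one_step(sizes, alpha, beta, lam, omega0, rng):
    """Apply rules (i)-(iii) once; `sizes` (users per node) is modified in place."""
    n_nodes, n_users = len(sizes), sum(sizes)
    # (i) About alpha*W new users arrive and choose a node by preference.
    for _ in range(round(alpha * n_users)):
        sizes[pick_by_size(sizes, rng)] += 1
    # (ii) About beta*N new nodes appear, each seeded with omega0 users taken
    # from randomly chosen users of the previously existing nodes.
    for _ in range(round(beta * n_nodes)):
        for _ in range(omega0):
            sizes[pick_by_size(sizes[:n_nodes], rng)] -= 1
        sizes.append(omega0)
    # (iii) About lam*W users leave their node and choose again by preference.
    for _ in range(round(lam * n_users)):
        sizes[pick_by_size(sizes, rng)] -= 1
        sizes[pick_by_size(sizes, rng)] += 1
    return sizes

rng = random.Random(0)
sizes = [100] * 50  # toy initial condition: 50 nodes with 100 users each
for _ in range(20):
    evolve_one_step(sizes, alpha=0.035, beta=0.03, lam=0.01, omega0=20, rng=rng)
print(len(sizes), sum(sizes), max(sizes), min(sizes))
```

Drawing a node with probability proportional to its size is equivalent to drawing a user uniformly at random, which is how the withdrawal in rule (ii) and the departures in rule (iii) are implemented here.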
A. Analytic results



The resource dynamics of single nodes is described by the following stochastic differential
equation

$$\frac{d\omega_i}{dt} = A(\omega_i, t) + [D(\omega_i, t)]^{1/2}\,\xi(t), \qquad (1)$$

where $\omega_i$ is the number of users attached to AS $i$ at time $t$. The time dependent drift is
$A(\omega_i, t) = (\alpha + \lambda) W(t) \Pi_i - \lambda \omega_i - \beta \omega_0$, and the diffusion term is
$D(\omega_i, t) = (\alpha + \lambda) W(t) \Pi_i + \lambda \omega_i + \beta \omega_0 - 2 \lambda \omega_i \Pi_i$.
Application of the Central Limit Theorem guarantees the convergence
of the noise to a Gaussian white noise in the limit $W(t) \gg 1$. The first term in the
expression for the drift is a creation term accounting for new and old users that choose node
$i$. The second term represents those users who decide to change their node and, finally, the
last term corresponds to the decrease of users due to the introduction of newly created nodes. To
proceed further, we need to specify the preference function $\Pi_i(\{\omega_j(t)\})$. We assume that, as
a result of a competition process, nodes bigger in resources get users more easily than small
ones. The simplest function satisfying this condition corresponds to the linear preference,
that is, $\Pi_i(\{\omega_j(t)\}) = \omega_i / W(t)$, where $W(t) = \omega_0 N_0 \exp(\alpha t)$. In this case, the
stochastic differential equation (1) reads

$$\frac{d\omega_i}{dt} = \alpha \omega_i - \beta \omega_0 + [(\alpha + 2\lambda)\,\omega_i + \beta \omega_0]^{1/2}\,\xi(t), \qquad (2)$$

where the last diffusion term, of order $\omega_i^2 / W(t)$, has been dropped.
Notice that reallocation of users (i.e. the $\lambda$-term) only increases the diffusive part in
Eq. (2) but has no net effect on the drift term, which is, eventually, the leading term.
The complete solution of this problem requires solving the Fokker-Planck equation
corresponding to Eq. (2) with a reflecting boundary condition at $\omega = \omega_0$ and initial condition
$p(\omega_i, t_i | \omega_0, t_i) = \delta(\omega_i - \omega_0)$ ($\delta(\cdot)$ stands for the Dirac delta function).
Here $p(\omega_i, t | \omega_0, t_i)$ is the probability that node $i$ has $\omega_i$ users at time $t$ given
that it had $\omega_0$ at time $t_i$. The choice of a reflecting boundary condition at $\omega = \omega_0$ is
equivalent to assuming that $\beta$ is the overall growth rate of the number of nodes, that is, the
composition of the birth and death processes ruling the evolution of the number of nodes.

Finding the solution of this problem is not an easy task. Fortunately, we can take
advantage of the fact that, when $\alpha > \beta$, the average number of users of each node increases
exponentially and, since $D(\omega_i, t) = O(A(\omega_i, t))$, fluctuations vanish in the long time limit.
Under this zero noise approximation, the number of users connected to a node introduced
at time $t_i$ is

$$\omega_i(t | t_i) = \frac{\beta}{\alpha}\,\omega_0 + \left(1 - \frac{\beta}{\alpha}\right) \omega_0\, e^{\alpha (t - t_i)}. \qquad (3)$$

The probability density function of $\omega$ can be calculated in the long time limit as

$$p(\omega, t) = \beta e^{-\beta t} \int_0^t e^{\beta t_i}\, \delta(\omega - \omega_i(t | t_i))\, dt_i, \qquad (4)$$

which leads to

$$p(\omega, t) = \frac{\tau (1 - \tau)^{\tau} \omega_0^{\tau}}{(\omega - \tau \omega_0)^{1 + \tau}}\, \Theta(\omega_c(t) - \omega), \qquad (5)$$

where we have defined $\tau = \beta / \alpha$, $\Theta(\cdot)$ is the Heaviside step function, and the cut-off
is given by $\omega_c(t) \sim (1 - \tau)\, \omega_0 e^{\alpha t} \sim W(t)$.
Thus, in the long time limit, $p(\omega, t)$ approaches a stationary distribution with an increasing
cut-off that scales linearly with the total number of users. The exponent $1 + \tau$ depends on the
relative values of $\beta$ and $\alpha$, which can be different but typically would stay close, so that
the exponent takes a value around 2.
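A quick numerical check of the zero noise approximation can be sketched as follows; it is an illustration, not part of the original analysis. The closed form of Eq. (3) is compared with a forward-Euler integration of the drift of Eq. (2), and the tail of the size distribution implied by Eq. (5) is checked against the age distribution of the nodes. The function names, the step `dt` and the chosen age are our own choices; the parameter values are those quoted later for the simulations.

```python
import math

def size_closed_form(age, alpha, beta, omega0):
    """Eq. (3): users of a node of the given age, with age = t - t_i."""
    tau = beta / alpha
    return tau * omega0 + (1.0 - tau) * omega0 * math.exp(alpha * age)

def size_euler(age, alpha, beta, omega0, dt=1e-3):
    """Integrate d(omega)/dt = alpha*omega - beta*omega0 from omega = omega0."""
    omega = omega0
    for _ in range(int(round(age / dt))):
        omega += (alpha * omega - beta * omega0) * dt
    return omega

def size_tail(omega, alpha, beta, omega0):
    """Fraction of nodes larger than omega, integrating Eq. (5) below the cut-off."""
    tau = beta / alpha
    return ((1.0 - tau) * omega0 / (omega - tau * omega0)) ** tau

alpha, beta, omega0 = 0.035, 0.03, 5000.0
age = 40.0
exact = size_closed_form(age, alpha, beta, omega0)
numeric = size_euler(age, alpha, beta, omega0)

# Without noise, a node is larger than every younger node, so the fraction of
# larger nodes equals the fraction of older ones, exp(-beta * age) in the
# long time limit.
tail = size_tail(exact, alpha, beta, omega0)
older_fraction = math.exp(-beta * age)
print(exact, numeric, tail, older_fraction)
```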

The key point now is to construct a bridge between the competition and the adaptation
mechanisms, in other words, to relate the number of users attached to an AS with
its degree and bandwidth. Our basic assumption is that vertices are continuously adapting
their strength or bandwidth, the total weight of their connections, to the number of users they
have. However, once a node decides to increase its bandwidth it has to find a peer which, at
the same time, wants to increase its bandwidth as well. The reason is that connection costs
among nodes must be assumed by both peers. This differs from other growing models, in
which vertices do not ask target vertices whether they really want to form those connections. Our
model is, then, to be thought of as a coupling between a competition process for users and
the adaptation of vertices to their current situation, with the constraint that connections are only
formed between "active" nodes, that is, those ASs with a positive increment of their number
of users. Let $b_i(t | t_i)$ be the total bandwidth of a node at time $t$ given that it was introduced at
time $t_i$. This quantity can include single connections with other nodes, i.e. the topological
degree $k$, but it also accounts for connections which have higher capacity. This is equivalent
to saying that the network is, in fact, weighted and $b_i$ is the weighted degree. To simplify the
model we consider that bandwidth is discretized in such a way that single connections with
high capacity are equivalent to multiple connections between the same nodes. Then, when
a pair of nodes agrees to increase their mutual connectivity, a new connection is formed
if they were not previously connected or, if they were, their mutual bandwidth increases by
one unit, reinforcing in this way their connectivity. Now, we assume that, at time $t$, each
node adapts its total bandwidth proportionally to its number of users, or size, following a
linear relation. Thus, we can write



$$b_i(t | t_i) = 1 + a(t)\,(\omega_i(t | t_i) - \omega_0). \qquad (6)$$

Summing Eq. (6) over all nodes we get $a(t) = (2B(t) - N(t)) / (W(t) - \omega_0 N(t)) \approx 2B(t)/W(t)$,
where $B(t)$ is the total bandwidth of the network. $B(t)$ is, obviously, an upper bound to
the total number of edges of the network. This suggests that $B(t)$ will grow according to
$B(t) = B_0 e^{\delta' t}$. As the number of users grows, the global traffic of the Internet also grows,
which means that nodes do not only adapt their bandwidth to their number of users but
also to the global traffic of the network. Therefore, $a(t)$ must be an increasing function of $t$,
which, in turn, implies that $\delta' > \alpha$ and, thus, $\delta' > \delta$. As a consequence, the network
must necessarily contain multiple connections. This can be explicitly seen by inspecting
the scaling of the maximum bandwidth, which reads $b_c(t) \propto N(t)^{\delta'/\beta}$, that is, faster than
$N(t)$. Therefore, the topological degree of a node cannot be proportional to its bandwidth.
Nevertheless, it is clear that $k_i$ and $b_i$ are positively correlated random variables. We then
propose that degree and bandwidth are related, in a statistical sense, through the following
scaling relation

$$k(t | t_i) = [b(t | t_i)]^{\mu}, \qquad \mu < 1, \qquad (7)$$

which implies that all nodes can form multiple connections, regardless of their size. This
scaling behavior has recently been observed in other weighted networks. The superlinear
behavior of $b_c(t)$, combined with this scaling relation, ensures that rich ASs will
connect to a macroscopic portion of the system, so that the maximum degree will scale
linearly with the system size. Empirical measurements made in [23] showed such linear scaling
for the AS with the largest degree. Since $k_c \propto b_c^{\mu} \propto N^{\mu \delta'/\beta}$, this sets the scaling
exponent to $\mu = \beta / \delta'$.

The four growth rates in the model are not all independent but can be related by exploring
the interplay between bandwidth, connectivity, and traffic of the network. Summing Eq. (7)
over all vertices, the scaling of the total number of connections is $E(t) \propto N(t)^{2 - \alpha/\delta'}$, which
leads to $\delta' = \alpha \beta / (2\beta - \delta)$. Combining this relation with Eqs. (5), (6) and (7), the degree
distribution reads

$$P(k) \sim \frac{\tau (1 - \tau)^{\tau}}{\mu}\, \frac{[\omega_0 a(t)]^{\tau}}{k^{\gamma}}\, \Theta(k_c(t) - k) \qquad (8)$$

for $k \gg 1$, where the step function $\Theta$ cuts the distribution off at the maximum degree $k_c(t)$
and the exponent $\gamma$ takes the value $\gamma = 1 + 1/(2 - \delta/\beta)$. Strikingly, the
exponent $\gamma$ has lost any direct dependence on $\alpha$, becoming a function of the ratio $\delta/\beta$. Using
the empirical values for $\beta$ and $\delta$, the predicted exponent is $\gamma = 2.2 \pm 0.1$, in excellent
agreement with the values reported in the literature. Of course, this does not mean
that the exponent $\gamma$ is independent of $\alpha$, since both $\beta$ and $\delta$ may depend on the growth
of the number of users. In any case, our model turns out to depend on just two independent
parameters which can be expressed as ratios of the rates of growth, $\beta/\alpha$ and $\delta/\beta$.
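The relations among the rates can be evaluated directly, as in the following sketch. The function name `derived_exponents` is ours, and the inputs are the central values of the measured rates without their uncertainties. For these central values the expression for $\gamma$ evaluates to roughly 2.1, and the sketch also checks that it coincides with the equivalent form $1 + \tau/\mu$.

```python
def derived_exponents(alpha, beta, delta):
    """Return (delta_prime, mu, gamma) from the three measured growth rates."""
    if not 2.0 * beta > delta:
        raise ValueError("the relation for delta' requires 2*beta > delta")
    delta_prime = alpha * beta / (2.0 * beta - delta)  # growth rate of B(t)
    mu = beta / delta_prime                            # exponent of Eq. (7)
    gamma = 1.0 + 1.0 / (2.0 - delta / beta)           # exponent of Eq. (8)
    return delta_prime, mu, gamma

alpha, beta, delta = 0.036, 0.0304, 0.0330  # central values, per month
delta_prime, mu, gamma = derived_exponents(alpha, beta, delta)

# Equivalent form of the degree exponent in terms of tau = beta/alpha and mu.
tau = beta / alpha
gamma_alt = 1.0 + tau / mu

print(delta_prime, mu, gamma, gamma_alt)
```

The computed $\delta'$ exceeds $\alpha$, as required for $a(t)$ to be an increasing function of time.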



B. Simulations 



So far, we have been mainly interested in the degree distribution of the AS map but
not in the specific way in which the network is formed. To fill this gap we have performed
numerical simulations that generate network topologies in good agreement with real measures
of the Internet. Although ASs are distributed systems, we assume they follow the same
spatial distribution as the one measured for routers, so that we are able to define a physical
distance among them to take into account connection costs [9]. Our algorithm, following
the lines of the model, works in four steps:

1. At iteration $t$, $\Delta W(t) = \omega_0 N_0 (e^{\alpha t} - e^{\alpha (t-1)})$ users join the network and choose
a provider among the existing nodes using the linear preference rule.

2. $\Delta N(t) = N_0 (e^{\beta t} - e^{\beta (t-1)})$ new ASs are introduced with $\omega_0$ users each, those being
randomly withdrawn from already existing ASs. Newly created ASs are located in a
two dimensional plane following a fractal set of dimension $D_f = 1.5$ [9].

3. Each AS evaluates its increase of bandwidth, $\Delta b_i(t | t_i)$, according to Eq. (6).

4. A pair of nodes, $(i, j)$, is chosen with probability proportional to $\Delta b_i(t | t_i)$ and $\Delta b_j(t | t_j)$
respectively, and, whenever they both need to increase their bandwidth, they form a
connection with probability $D(d_{ij}, \omega_i, \omega_j)$. This function takes into consideration that,
due to connection costs, physical links over long distances are unlikely to be created by
small peers. Once the first connection has been formed, they create a new connection
with probability $r$, whenever they still need to increase their bandwidth. This step is
repeated until all nodes have the desired bandwidth.
It is important to stress the fact that nodes must be chosen with probability proportional
to their increase in bandwidth at each step. The reason is that those nodes that need
a high bandwidth increase will be more active when looking for partners with whom to form
connections. Another important point is the role of the parameter $r$. This parameter takes
into account the balance between the costs of forming connections with new peers and the
need for diversification in the number of partners. The effect of $r$ on the network topology
is to tune the average degree and the clustering coefficient by modulating the number of
multiple connections. The exponent $\gamma$ is unaffected except in the limiting case $r \to 1$. In this
situation, big peers will create a huge amount of multiple connections among them, thus
reducing the maximum degree of the network. Finally, we chose an exponential form for the
distance probability function, $D(d_{ij}, \omega_i, \omega_j) = e^{-d_{ij}/d_c(\omega_i, \omega_j)}$, where
$d_c(\omega_i, \omega_j) = \omega_i \omega_j / (\kappa W(t))$ and $\kappa$ is a cost function of the number of users per unit
distance, depending on the maximum distance of the fractal set. All simulations are performed
using $\omega_0 = 5000$, $N_0 = 2$, $B_0 = 1$, $\alpha = 0.035$, $\beta = 0.03$, and $\delta' = 0.04$. The final size of
the networks is $N \approx 11000$, which approximately corresponds to the size of the actual maps
for 2001 that we are considering in this work.
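Step 4 of the algorithm can be sketched as follows. This is an illustrative reading of the procedure, not the code used to produce the results: node positions are drawn uniformly in the unit square instead of on a fractal set, the value of $\kappa$ is arbitrary, pairs are drawn proportionally to the bandwidth each node still needs, and the function name `connect_active_nodes` and its `max_trials` safeguard are ours.

```python
import math
import random
from collections import Counter

def connect_active_nodes(deficit, sizes, positions, total_users, kappa, r, rng,
                         max_trials=100_000):
    """Pair nodes that still need bandwidth; returns {(i, j): weight}, i < j.

    deficit[i] is the bandwidth node i still has to add; it is updated in place.
    """
    links = Counter()

    def add_unit(i, j):
        links[(min(i, j), max(i, j))] += 1
        deficit[i] -= 1
        deficit[j] -= 1

    for _ in range(max_trials):
        active = [n for n, need in enumerate(deficit) if need > 0]
        if len(active) < 2:
            break
        need = [deficit[n] for n in active]
        i, j = rng.choices(active, weights=need, k=2)
        if i == j:
            continue
        if (min(i, j), max(i, j)) not in links:
            # First connection: accepted with probability D = exp(-d_ij / d_c).
            d_ij = math.dist(positions[i], positions[j])
            d_c = sizes[i] * sizes[j] / (kappa * total_users)
            if rng.random() >= math.exp(-d_ij / d_c):
                continue
            add_unit(i, j)
        # Further units between the same pair are added with probability r,
        # as long as both nodes still need bandwidth.
        while deficit[i] > 0 and deficit[j] > 0 and rng.random() < r:
            add_unit(i, j)
    return links

rng = random.Random(1)
sizes = [5000, 8000, 12000, 20000, 55000]              # users per AS (toy values)
positions = [(rng.random(), rng.random()) for _ in sizes]
deficit = [1, 2, 2, 3, 6]                              # bandwidth still needed
links = connect_active_nodes(deficit, sizes, positions, sum(sizes),
                             kappa=1000.0, r=0.8, rng=rng)
print(dict(links), deficit)
```

Each unit of bandwidth is charged to both endpoints, which reflects the requirement that both peers must want to increase their bandwidth before a connection is formed or reinforced.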



IV. TESTING THE MODEL 

To test the model we construct synthetic networks from our algorithm with and without
taking into consideration the geographical distribution of ASs, and we contrast several
measures on those graphs with those of real maps, more specifically, the AS map dated May
2001 from data collected by the Oregon Route Views Project [23], and the AS extended
(AS+) map [26], which completes the previous one with data from other sources. Let us
note that all the measures presented here are performed over the same synthetic networks.
The parameters of the model are fixed once and for all before generating the networks, so
that they are not tuned in order to approach different properties.

First, we analyze a first category of measures which includes the features of traditional
interest when aiming to reproduce the Internet topology. The small-world effect becomes
clear when analyzing the distribution of the shortest path lengths, as seen in the left
graph of Fig. 2, with an average shortest path length very close to the real one. The graph
on the right of Fig. 2 shows simulation results for the cumulative degree distribution, in
good agreement with that measured for the AS+ map. The inset exhibits simulation results of
the AS's degree as a function of the AS's bandwidth, confirming the scaling ansatz, Eq. (7).
The clustering coefficient and the average nearest neighbors degree are shown in Fig. 3. Dashed
lines result from the model without distance constraints, whereas squares correspond to the
model with distance constraints. Interestingly, the high level of clustering coming out of
the model arises as a consequence of the pattern followed to attach nodes, so that only
those ASs willing to form new connections will link. As can be observed in the figures, distance
constraints introduce a disassortative component by inhibiting connections between small
ASs, so that the hierarchical structure of the real network is better reproduced.

FIG. 2: Distribution of the shortest path lengths (left) and cumulative degree distribution,
$P_c(k) = \sum_{k' \geq k} P(k')$, (right) for the extended AS map compared to simulations of the
model, $r = 0.8$. Inset (right): Simulation results of the AS's degree as a function of AS's
bandwidth. The solid line stands for the scaling relation Eq. (7) with $\mu = \beta/\delta' = 0.75$.
Legend: model with distance; Internet AS+ map.
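For reference, the degree-dependent clustering coefficient and average nearest neighbors degree can be computed from an adjacency structure as in the following sketch. The function name and the four-node toy graph are ours, and the normalization by $\langle k \rangle / \langle k^2 \rangle$ used in Fig. 3 is left out for brevity.

```python
from collections import defaultdict

def degree_class_averages(adj):
    """Return (c_k, knn_k): clustering and mean neighbour degree per degree k.

    adj maps each node to the set of its neighbours (undirected, no self-loops).
    """
    c_sum = defaultdict(float)
    knn_sum = defaultdict(float)
    count = defaultdict(int)
    for node, nbrs in adj.items():
        k = len(nbrs)
        if k == 0:
            continue
        # Links among the neighbours, i.e. triangles passing through the node.
        triangles = sum(len(adj[u] & nbrs) for u in nbrs) / 2
        c_sum[k] += 2.0 * triangles / (k * (k - 1)) if k > 1 else 0.0
        knn_sum[k] += sum(len(adj[u]) for u in nbrs) / k
        count[k] += 1
    c_k = {k: c_sum[k] / count[k] for k in count}
    knn_k = {k: knn_sum[k] / count[k] for k in count}
    return c_k, knn_k

# Toy graph: a triangle (0, 1, 2) with a pendant node 3 attached to node 0.
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
c_k, knn_k = degree_class_averages(adj)
print(c_k, knn_k)
```

In this toy graph the average neighbor degree decreases with $k$, which is the signature of disassortative correlations discussed above.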

Now, we turn our attention to new measures, which run from the scaling of higher order
loops to the k-core structure. Not only are two-point correlations well approximated by our
model, but it is also able to reproduce the scaling behavior of the number of loops of size 3, 4
and 5. This has been recently measured for the Internet at the AS level by Bianconi et al., and
it is seen to follow a power of the system size of the form $N_h(N) \sim N^{\xi(h)}$, with exponents
that are closely reproduced by our synthetic networks, see Fig. 4 and Table I. In Fig. 5 we observe
on the left the cumulative distribution of the betweenness centrality as proposed by Freeman
[27], a measure of the varying importance of the vertices in a network. On the right, the
cumulative distribution of triangles passing by a node (for a discussion of the relevance of 
P(T) see, for instance, 
FIG. 3: Clustering coefficient, $c_k$, (left), and normalized average nearest neighbors degree,
$\bar{k}_{nn}(k) \langle k \rangle / \langle k^2 \rangle$, (right), as functions of the node's degree for the extended
autonomous system map (circles) and for the model with and without distance constraints (red
squares and dashed line, respectively).

TABLE I: Values of the exponents $\xi(h)$ for $h = 3$, 4, and 5 for the Internet and the models with
and without distance constraints (after Bianconi et al.).

System                    ξ(3)           ξ(4)           ξ(5)
Internet AS map           1.45 ± 0.07    2.07 ± 0.01    2.45 ± 0.08
Model with distance       1.60 ± 0.01    2.20 ± 0.03    2.70 ± 0.03
Model without distance    1.59 ± 0.03    2.11 ± 0.03    2.64 ± 0.03

Finally, we also show the k-core decomposition of the actual and the synthetic maps.
The k-core decomposition is a recursive reduction of the network as a function of the degree,
which allows the recognition of hierarchical structure and of more central nodes [20]. A very
good agreement between real measures and our models can be appreciated in Fig. 6. In
the case of the model with distance constraints, even the coreness, the maximum number
of layers in the k-core decomposition, is almost the same as in the Internet map. These
visualizations have been produced with the tool LANET-VI [21].
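The recursive pruning behind the k-core decomposition can be sketched in a few lines. This is a simple quadratic-time illustration with a hypothetical function name and toy graph, not the implementation used by LANET-VI: nodes of minimum remaining degree are removed one at a time, and each node is assigned the largest minimum degree seen up to its removal.

```python
def core_indices(adj):
    """Return {node: index of the innermost k-core containing the node}.

    adj maps each node to the set of its neighbours (undirected, no self-loops).
    """
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        # Peel the node with the smallest remaining degree.
        v = min(remaining, key=degree.__getitem__)
        k = max(k, degree[v])
        core[v] = k
        remaining.remove(v)
        for u in adj[v]:
            if u in remaining:
                degree[u] -= 1
    return core

# Toy graph: a triangle (0, 1, 2) with a pendant node 3 attached to node 0.
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
core = core_indices(adj)
number_of_layers = max(core.values())
print(core, number_of_layers)
```

In the toy graph the pendant node belongs only to the 1-core, while the triangle forms the 2-core, so the decomposition has two layers.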



FIG. 4: Scaling of the number of loops of size 3, 4 and 5 for the model with and without distance

FIG. 5: Cumulative distributions of the betweenness centrality (left) and of the number of triangles
passing by a node (right).

V. CONCLUSIONS

In summary, we have presented a simple weighted growing network model for the Internet,
based on evolution, environmental interaction and heterogeneity. The dynamics is driven
by two key mechanisms, competition and adaptation, which may be relevant in other
self-organizing systems. Beyond technical details, many empirical features are nicely reproduced,
but open questions remain, perhaps the most important one being whether the general ideas
and mechanisms presented in this work could help us to better understand other complex
systems.